
125 research outputs found

The human phylome: a large scale phylogenetic study on the human genome evolution

Author: Huerta Cepas Jaime
Publication date: 01/01/2008

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 07-11-200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

ETE: a python Environment for Tree Exploration

Author: Dopazo Joaquín
Gabaldón Toni
Huerta-Cepas Jaime
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results: Here we present the Environment for Tree Exploration (ETE), a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits linking trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions: ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score

Author: Jaime Huerta-Cepas
Leszek P. Pryszcz
Toni Gabaldón
Publication venue: Oxford University Press
Publication date: 01/01/2011

Reliable prediction of orthology is central to comparative genomics. Approaches based on phylogenetic analyses closely resemble the original definition of orthology and paralogy and are known to be highly accurate. However, the large computational cost associated with these analyses is a limiting factor that often prevents their use at genomic scales. Recently, several projects have addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology and paralogy relationships can be inferred. This provides us with the opportunity to infer the evolutionary relationships of genes from multiple, independent phylogenetic trees. Using such a strategy, we combine phylogenetic information derived from different databases to predict orthology and paralogy relationships for 4.1 million proteins in 829 fully sequenced genomes. We show that the number of independent sources from which a prediction is made, as well as the level of consistency across predictions, can be used as reliable confidence scores. A webserver has been developed to easily access these data (http://orthology.phylomedb.org), which provides users with a global repository of phylogeny-based orthology and paralogy predictions.
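The idea of scoring a prediction by how many independent sources support it, and how consistently, can be illustrated with a minimal Python sketch. This is not the MetaPhOrs scoring formula, which the abstract does not give. It assumes a simple fraction of agreeing trees, and the function name, input layout and gene/database identifiers are all hypothetical.

```python
from collections import defaultdict


def consistency_scores(tree_calls):
    """Summarise per-tree orthology calls for each gene pair.

    tree_calls: iterable of (gene_a, gene_b, source, is_ortholog) tuples,
    one per tree that contains both genes (hypothetical input layout).
    Returns {pair: (fraction_of_trees_calling_orthology, n_sources)}.
    """
    votes = defaultdict(lambda: {"orth": 0, "total": 0, "sources": set()})
    for gene_a, gene_b, source, is_ortholog in tree_calls:
        pair = tuple(sorted((gene_a, gene_b)))  # pair order must not matter
        entry = votes[pair]
        entry["total"] += 1
        entry["orth"] += 1 if is_ortholog else 0
        entry["sources"].add(source)
    return {
        pair: (entry["orth"] / entry["total"], len(entry["sources"]))
        for pair, entry in votes.items()
    }


# Toy calls from three made-up source databases.
calls = [
    ("hsa1", "mmu1", "dbA", True),
    ("mmu1", "hsa1", "dbB", True),
    ("hsa1", "mmu1", "dbC", False),
    ("hsa1", "hsa2", "dbA", False),
]
scores = consistency_scores(calls)
```

In this toy input the pair hsa1/mmu1 is called orthologous by two of three trees from three sources, while hsa1/hsa2 is seen in a single tree only, so a downstream filter could require both a minimum fraction and a minimum number of sources.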

Crossref

PubMed Central

UPF Digital Repository

Phylemon: a suite of web tools for molecular evolution, phylogenetics and phylogenomics

Author: Arbiza Leonardo
Dopazo Hernán
Dopazo Joaquín
Gabaldón Toni
Huerta-Cepas Jaime
Medina Ignacio
Tárraga Joaquín
Publication venue: Oxford University Press
Publication date: 01/01/2007

Phylemon is an online platform for phylogenetic and evolutionary analyses of molecular sequence data. It has been developed as a web server that integrates a suite of different tools selected among the most popular stand-alone programs in phylogenetic and evolutionary analysis. It has been conceived as a natural response to the increasing demand for data analysis from many experimental scientists wishing to add a molecular evolution and phylogenetics insight into their research. Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction, including methods for evolutionary rate analyses and molecular adaptation. Phylemon has several features that differentiate it from other resources: (i) it offers an integrated environment that enables the direct concatenation of evolutionary analyses, stores results and handles the required data format conversions; (ii) once an output file is produced, Phylemon suggests the next possible analyses, thus guiding the user and facilitating the integration of multi-step analyses; and (iii) users can define and save complete pipelines for specific phylogenetic analyses, to be automatically used on many genes in subsequent sessions or on multiple genes in a single session (phylogenomics). The Phylemon web server is available at http://phylemon.bioinfo.cipf.es.

CiteSeerX

Crossref

PubMed Central

From genes to functional classes in the study of biological systems

Author: Al-Shahrour Fátima
Arbiza Leonardo
Dopazo Hernán
Dopazo Joaquín
Huerta-Cepas Jaime
Montaner David
Mínguez Pablo
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: With the popularisation of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described and other similar ones can be found at:

Crossref

Directory of Open Access Journals

PubMed Central

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

Author: Bork Peer
Coelho Luis Pedro
Forslund Kristoffer
Huerta-Cepas Jaime
Jensen Lars Juhl
Szklarczyk Damian
von Mering Christian
Publication venue: Oxford University Press (OUP)
Publication date: 01/08/2017

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g. new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three Gene Ontology categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Copenhagen University Research Information System

MDC Repository

Evidence for systems-level molecular mechanisms of tumorigenesis

Author: Al-Shahrour Fátima
Capellá Gabriel
Dopazo Joaquín
Gómez Laia
Hernández Pilar
Huerta-Cepas Jaime
Montaner David
Pujana Miguel Angel
Valls Joan
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Cancer arises from the consecutive acquisition of genetic alterations. Increasing evidence suggests that as a consequence of these alterations, molecular interactions are reprogrammed in the context of highly connected and regulated cellular networks. Coordinated reprogramming would allow the cell to acquire the capabilities for malignant growth. Results: Here, we determine the coordinated function of cancer gene products (i.e., proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts, hereafter referred to as "CGPs") defined as their topological properties and organization in the interactome network. We show that CGPs are central to information exchange and propagation and that they are specifically organized to promote tumorigenesis. Centrality is identified by both local (degree) and global (betweenness and closeness) measures, and systematically appears in down-regulated CGPs. Up-regulated CGPs do not consistently exhibit centrality, but both types of cancer products determine the overall integrity of the network structure. In addition to centrality, down-regulated CGPs show topological association that correlates with common biological processes and pathways involved in tumorigenesis. Conclusion: Given the current limited coverage of the human interactome, this study proposes that tumorigenesis takes place in a specific and organized way at the molecular systems-level and suggests a model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins.
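For readers unfamiliar with the centrality measures named in this abstract, the following minimal Python sketch computes degree (a local measure) and closeness (a global measure) on a toy undirected network. It is an illustration only, not the study's pipeline: the graph, node names and function names are hypothetical, closeness uses the common (reachable nodes) / (sum of shortest-path lengths) definition, and betweenness is omitted for brevity.

```python
from collections import deque


def degree_centrality(graph):
    """Local measure: number of direct interaction partners per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Global measure: reachable nodes divided by summed shortest-path lengths."""
    scores = {}
    for start in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


# Toy star-shaped network: protein "A" interacts with "B", "C" and "D".
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
degrees = degree_centrality(toy_network)
closeness = closeness_centrality(toy_network)
```

In the star network the hub "A" has the highest degree and the highest closeness, which is the kind of pattern the abstract reports for down-regulated CGPs in the interactome.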

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Diposit Digital de la Universitat de Barcelona

PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

Author: Berthommier Guillaume
Domènech-Estévez Enric
Fourcade Stéphane
Gabaldón Toni
Huerta-Cepas Jaime
Poch Olivier
Pujol Aurora
Ripp Raymond
Schlüter Agatha
Wanders Ronald J. A.
Publication venue: Oxford University Press
Publication date: 28/11/2006

Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections ‘Genes’, ‘Functions’, ‘Metabolic pathways’ and ‘Diseases’, that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionarily conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, and (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate systems biology approaches on the organelle.

Crossref

PubMed Central

Diposit Digital de la Universitat de Barcelona

eggNOG v4.0: nested orthology inference across 3686 organisms

Author: Bork Peer
Creevey Chris
Forslund Kristoffer
Gabaldón Toni
Huerta-Cepas Jaime
Jensen Lars J.
Kuhn Michael
Powell Sean
Rattei Thomas
Roth Alexander
Szklarczyk Damian
Trachana Kalliopi
von Mering Christian
Publication date: 02/08/2017

With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.

RERO DOC Digital Library

eggNOG v4.0: Nested orthology inference across 3686 organisms

Author: Bork Peer
Creevey Chris
Forslund Kristoffer
Gabaldón Toni
Huerta-Cepas Jaime
Jensen Lars J.
Kuhn Michael
Powell Sean
Rattei Thomas
Roth Alexander
Szklarczyk Damian
Trachana Kalliopi
von Mering Christian
Publication date: 30/11/2013

With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download

CiteSeerX

Aberystwyth Research Portal

PubMed Central

Copenhagen University Research Information System

ZORA

UPF Digital Repository

MDC Repository